Status of LedgerHub, Releases, Installation, Documentation, Importers,... how can I get started?


johan...@gmail.com

Oct 7, 2014, 3:34:39 PM
to bean...@googlegroups.com
Hi!

I am interested in using LedgerHub for regularly pulling in online banking transactions. Many questions come to my mind, which I did not find answered on Beancount's homepage [1], design doc [2], and source code [3]:

* Current Status: What is the status of LedgerHub? Do you rather consider it to be experimental, or ready to be used in some sense?

* Version / Releases: I cannot find any release of ledgerhub, nor any release branch or tags in the mercurial repo. Are users intended to run the latest version from the repo? From a user's perspective, I would prefer to see versioned releases, ideally packaged and registered as a python package so I can simply run "pip install ledgerhub".

* Installation: The "TODO" file mentions a workaround for installing libmagic on Mac OS X. Is this workaround still required?

* Documentation, How to Get Started: How can I start using LedgerHub to import files that I downloaded from my bank? Judging from the source code, there seem to be two possible ways: I found the notion of import scripts, and I found command-line APIs. Are these equivalent ways of using ledgerhub? Note: If LedgerHub is ready for use, I am willing to try it out and to help improve documentation along the way.

* Importers: I was surprised to find a number of bank-specific importers in the "lib/python/ledgerhub/importers" directory. Instead, I would have expected general-purpose solutions for importing CSV, OFX etc, and only lightweight configurations for specific banks. So, in order to configure LedgerHub for the CSV files that I download from my banks would I also have to write - similarly complicated - importers? (Given my limited python skills, I tend to think this would not be worth the effort).

* Directory structure: Where should the source files (e.g., CSV) and the ledgerhub output (e.g., *.beancount, *.ledger) be saved? Are there any best practices / recommendations regarding the directory structure? Should I append the output to a main ledger file, or rather import it using an import directive?

* Duplicates: The ledgerhub design doc [2] mentions ledgerhub's capability of identifying duplicate transactions. How do I tell ledgerhub about pre-existing transactions?

* Ledger-Cli vs. Beancount Syntax: Can ledgerhub write (when importing) and read (for duplicate detection) both ledger and beancount syntax? Is the beancount syntax more stable / better supported than the ledger syntax, suggesting I should switch from ledger to beancount in order to use ledgerhub?

...many questions by a novice user ;-)
thank you for your attention and consideration!

Johannes


[1] http://furius.ca/ledgerhub/

[2] https://docs.google.com/document/d/11u1sWv7H7Ykbc7ayS4M9V3yKqcuTY7LJ3n1tgnEN2Hk/edit

[3] https://hg.furius.ca/public/ledgerhub/file/51a415845777/doc

Martin Blais

Oct 8, 2014, 12:36:42 AM
to johan...@gmail.com, bean...@googlegroups.com
On Tue, Oct 7, 2014 at 2:34 PM, <johan...@gmail.com> wrote:
Hi!

I am interested in using LedgerHub for regularly pulling in online banking transactions. Many questions come to my mind, which I did not find answered on Beancount's homepage [1], design doc [2], and source code [3]:

* Current Status: What is the status of LedgerHub?

Not released, but working enough that I'm using it myself every week and it works.
A few rough edges here and there.


 
Do you rather consider it to be experimental, or ready to be used in some sense?

You can begin using it if you want; unfortunately, it's missing some good example configurations.

In order to build some example configurations, I had to have some good example data in Beancount.
I'm working on an example generator for Beancount every day now -- I'm practically done, just needs minor cleanup at this point. It's called "bean-example". Try it.

Then I'll build some example import configurations matching this example data file, along with some sample input files.
With these configurations you should be able to get a good idea and build your own and start using it.


* Version / Releases: I cannot find any release of ledgerhub, nor any release branch or tags in the mercurial repo. Are users intended to run the latest version from the repo? From a user's perspective, I would prefer to see versioned releases, ideally packaged and registered as a python package so I can simply run "pip install ledgerhub".

At the moment, just work off of HEAD. There are no releases yet, so you can't install it with pip.
For now, the way to install it is: hg clone, hg update, and then python setup.py install.



* Installation: The "TODO" file mentions a workaround for installing libmagic on Mac OS X. Is this workaround still required?

I'm working on a Linux machine and haven't done much testing on Mac. Although LedgerHub is designed to work in the presence of a varying set of supporting command-line tools (not everyone will have the same PDF conversion tools on their machines, for instance), it's probably best to assume you may have to do a bit of troubleshooting and debugging on a platform that is less tested.

 

* Documentation, How to Get Started: How can I start using LedgerHub to import files that I downloaded from my bank? Judging from the source code, there seem to be two possible ways: I found the notion of import scripts, and I found command-line APIs. Are these equivalent ways of using ledgerhub? Note: If LedgerHub is ready for use, I am willing to try it out and to help improve documentation along the way.

You have to write a simple configuration file that provides patterns to match on your downloaded files and corresponding instances of importer objects, objects which need to be initialized with suitable configuration for your accounts. Then you can use the various command-line tools using that configuration file.
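To make the idea of "patterns paired with configured importer objects" concrete, here is a minimal sketch. The class `CheckingImporter`, its `account` parameter, the `CONFIG` list, the `find_importer` helper and the account numbers are all hypothetical names for illustration, not LedgerHub's actual API; the real format is defined by the example import files in the repository.

```python
import re


class CheckingImporter:
    """Hypothetical importer, configured with the account it posts to."""

    def __init__(self, account):
        self.account = account

    def __repr__(self):
        return "CheckingImporter({!r})".format(self.account)


# Each entry pairs a regular expression (searched for in a downloaded
# file's text) with an importer instance configured for one account.
CONFIG = [
    (r"ACCTID>123456789", CheckingImporter("Assets:US:MyBank:Checking")),
    (r"ACCTID>987654321", CheckingImporter("Assets:US:MyBank:Savings")),
]


def find_importer(match_text, config=CONFIG):
    """Return the first importer whose pattern matches the text, or None."""
    for pattern, importer in config:
        if re.search(pattern, match_text):
            return importer
    return None
```

For example, `find_importer("<ACCTID>987654321</ACCTID>")` returns the importer configured for the savings account. The design choice is that the matching rules and the per-account parameters live in one small user-owned file, while the importer code itself stays generic per institution.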

 

* Importers: I was surprised to find a number of bank-specific importers in the "lib/python/ledgerhub/importers" directory. Instead, I would have expected general-purpose solutions for importing CSV, OFX etc, and only lightweight configurations for specific banks.

The world just isn't that simple. I use vanilla OFX whenever I can, but even that has complications. CSV files vary quite a bit too, and not just in their columns, but in the interpretations of their columns.

 
So, in order to configure LedgerHub for the CSV files that I download from my banks would I also have to write - similarly complicated - importers? (Given my limited python skills, I tend to think this would not be worth the effort).

Yes. But I find most importers to be relatively simple.

Note that there's really no reason someone could not write a generic CSV importer that you can configure with a bit of information about the columns' semantics, like Reckon does. This method does, however, assume a lot of things about the input files, and it certainly would not support many of the CSV files you can download. For example, some platforms output a CSV file, but the numbers have to be treated differently depending on the value of some other column. It's not realistic to assume that all CSV files follow relatively simple semantics - some custom importers will be required.
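A Reckon-style generic reader might look roughly like the sketch below: a column mapping says which column holds what, and one extra option handles the "depends on another column" case by negating amounts when a type column says "Debit". The column names, the `COLUMNS` mapping and `read_rows` are hypothetical; this is not part of LedgerHub.

```python
import csv
import io
from decimal import Decimal

# Hypothetical column mapping for one bank's CSV export.
COLUMNS = {"date": "Date", "narration": "Description", "amount": "Amount"}


def read_rows(text, columns=COLUMNS, sign_column=None, debit_value=None):
    """Yield (date, narration, amount) tuples from CSV text.

    If sign_column is given, the amount is negated on rows where that
    column equals debit_value, i.e. the meaning of the amount column
    depends on the value of another column.
    """
    for row in csv.DictReader(io.StringIO(text)):
        amount = Decimal(row[columns["amount"]])
        if sign_column is not None and row[sign_column] == debit_value:
            amount = -amount
        yield row[columns["date"]], row[columns["narration"]], amount


data = (
    "Date,Description,Amount,Type\n"
    "2014-10-01,Coffee,3.50,Debit\n"
    "2014-10-02,Salary,1000.00,Credit\n"
)
rows = list(read_rows(data, sign_column="Type", debit_value="Debit"))
```

Each new quirk (split debit/credit columns, locale-specific number formats, multi-line headers) needs yet another option, which is why a purely configurable importer only covers the simpler files.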

In any case, writing importers that work well for the institutions that you use is something you just have to do. If that's too much, then LedgerHub is not for you. However... if that's too much, I don't know what is either. It's just the nature of the problem you're dealing with: input files are messy, there's no way around it.  On the other hand, if that's _not_ too much, then you can leverage the tools and the methodology for importing.



* Directory structure: Where should the source files (e.g., CSV) and the ledgerhub output (e.g., *.beancount, *.ledger) be saved? Are there any best practices / recommendations regarding the directory structure? Should I append the output to a main ledger file, or rather import it using an import directive?

I just download the files to my ~/Downloads folder without even renaming them and then I use ledgerhub-extract to get the Beancount transaction info out of them. Then I use ledgerhub-file to move the files to a repository where I store and control all my files. The directory hierarchy mirrors that of accounts.
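The "directory hierarchy mirrors that of accounts" idea can be sketched as follows: each component of an account name becomes a nested directory. The date prefix on the file name, and the names `filing_path` and `file_document`, are assumptions for illustration; they are not necessarily what ledgerhub-file does.

```python
import datetime
import os
import shutil


def filing_path(root, account, date, filename):
    """Return the destination path for a document under root.

    "Assets:US:BofA:Checking" becomes root/Assets/US/BofA/Checking/.
    The file name is prefixed with the statement date (an assumed
    convention) so that documents sort chronologically.
    """
    components = account.split(":")
    dated_name = "{:%Y-%m-%d}.{}".format(date, filename)
    return os.path.join(root, *(components + [dated_name]))


def file_document(src, root, account, date):
    """Move a downloaded file into the account-mirroring hierarchy."""
    dest = filing_path(root, account, date, os.path.basename(src))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(src, dest)
    return dest
```

For example, filing `~/Downloads/stmt.qfx` for `Assets:US:BofA:Checking` dated 2014-10-07 would produce `<root>/Assets/US/BofA/Checking/2014-10-07.stmt.qfx`.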


* Duplicates: The ledgerhub design doc [2] mentions ledgerhub's capability of identifying duplicate transactions. How do I tell ledgerhub about pre-existing transactions?

This is currently not working. I disabled it a little while back and I need to bring it back in.

(You would tell ledgerhub about the pre-existing transactions by providing your current beancount input file. It just knows to fuzzy compare the previously imported transactions with the newly imported ones.)
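A fuzzy comparison of newly imported transactions against existing ones could be sketched like this. The specific heuristic (identical amount, dates within a few days, similar descriptions via the standard library's `difflib.SequenceMatcher`) and the thresholds are assumptions, not LedgerHub's actual algorithm; `Txn`, `is_probable_duplicate` and `find_duplicates` are hypothetical names.

```python
import datetime
from decimal import Decimal
from difflib import SequenceMatcher
from typing import NamedTuple


class Txn(NamedTuple):
    date: datetime.date
    narration: str
    amount: Decimal


def is_probable_duplicate(new, old, max_days=3, min_ratio=0.6):
    """Heuristic: same amount, nearby dates, similar descriptions."""
    if new.amount != old.amount:
        return False
    if abs((new.date - old.date).days) > max_days:
        return False
    ratio = SequenceMatcher(None, new.narration.lower(), old.narration.lower()).ratio()
    return ratio >= min_ratio


def find_duplicates(new_txns, existing_txns):
    """Return the new transactions that look like already-imported ones."""
    return [n for n in new_txns
            if any(is_probable_duplicate(n, o) for o in existing_txns)]


existing = [Txn(datetime.date(2014, 10, 1), "STARBUCKS #123", Decimal("-3.50"))]
incoming = [
    Txn(datetime.date(2014, 10, 2), "Starbucks #123 Seattle", Decimal("-3.50")),
    Txn(datetime.date(2014, 10, 2), "Salary", Decimal("1000.00")),
]
dupes = find_duplicates(incoming, existing)
```

The date window matters because banks often post the same transaction on a different day in different downloads; the description check keeps two unrelated purchases of the same amount from being merged.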


* Ledger-Cli vs. Beancount Syntax: Can ledgerhub write (when importing) and read (for duplicate detection) both ledger and beancount syntax? Is the beancount syntax more stable / better supported than the ledger syntax, suggesting I should switch from ledger to beancount in order to use ledgerhub?

At the moment there is no Ledger support. Output should be easy to add. Input would ideally use Python bindings, but I haven't been able to build those easily on Mac or Linux so far, so I haven't worked on it.


--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/f299b6bd-a5c4-49ef-bcf2-95dfd00b0d0d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

johan...@gmail.com

Oct 8, 2014, 2:03:27 PM
to bean...@googlegroups.com, johan...@gmail.com
Thank you very much for your detailed response! I'll start poking around to try things out.
Johannes

johan...@gmail.com

Oct 8, 2014, 3:40:49 PM
to bean...@googlegroups.com, johan...@gmail.com
I was successful in installing ledgerhub, so I can now execute the ledgerhub* scripts (they print help messages when I invoke them).

I made some small improvements to the setup.py script. Feel free to pull them from:
https://bitbucket.org/johannesjh/ledgerhub

But I am having a hard time writing an import configuration. I tried to use and/or adapt the example file provided in "examples/importing.import". But the file only gives me error messages stating that python cannot resolve some of the imports: E.g., line 11: "from ledgerhub.driver import import_main", and in line 18: "from ledgerhub.importers.generic import ofx". ...are those import statements correct? I cannot find any definitions of import_main in the entire project.

thank you,
Johannes

Martin Blais

Oct 11, 2014, 3:07:10 PM
to johan...@gmail.com, bean...@googlegroups.com


Martin Blais

Oct 11, 2014, 3:07:17 PM
to johan...@gmail.com, bean...@googlegroups.com
On Wed, Oct 8, 2014 at 3:40 PM, <johan...@gmail.com> wrote:


Martin Blais

Oct 11, 2014, 3:08:01 PM
to johan...@gmail.com, bean...@googlegroups.com
On Wed, Oct 8, 2014 at 3:40 PM, <johan...@gmail.com> wrote:
I was successful in installing ledgerhub, so I can now execute the ledgerhub* scripts (they print help messages when I invoke them).

I made some small improvements to the setup.py script. Feel free to pull them from:
https://bitbucket.org/johannesjh/ledgerhub

Done! Thanks.


But I am having a hard time writing an import configuration. I tried to use and/or adapt the example file provided in "examples/importing.import". But the file only gives me error messages stating that python cannot resolve some of the imports: E.g., line 11: "from ledgerhub.driver import import_main", and in line 18: "from ledgerhub.importers.generic import ofx". ...are those import statements correct? I cannot find any definitions of import_main in the entire project.

The import file needed to be updated. Update and try again.

I need to add it to the unit tests so that it gets tested.

Thank you,




 

thank you,
Johannes


johan...@gmail.com

Oct 15, 2014, 5:12:10 AM
to bean...@googlegroups.com, johan...@gmail.com
Thank you for updating the example script!

1. Improved Setup Script

I changed the setup.py script to use setuptools instead of distutils because I still got import-not-found errors (which appear to be fixed by the new setup.py script). Feel free to pull from:
https://bitbucket.org/johannesjh/ledgerhub
Note that the new setup.py script obsoletes the scripts in the "bin" folder. I guess we can delete them now?


2. MIME-type Problems

Using the new setup script, I can now successfully run the example import scripts (both "examples/importing.import" and "examples/other/example.import"). But MIME-type detection does not seem to work properly, so the file to be imported is not found (probably because I am working on MacOS rather than Linux).

$ python3
Python 3.4.1 (default, Aug 24 2014, 21:32:40)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> magic.from_file("examples/Downloads/ofx.qfx", mime=True)
b'text/plain'

...the above output shows that qfx files are recognized as b'text/plain' by python_magic.

Any idea how I could fix this?

Note: There seem to be ways of teaching MacOS about a new Mime Type. E.g., see
http://superuser.com/questions/421792/how-to-associate-mime-type-with-a-handler-in-os-x
Using the above answer and tool, I can configure which application opens which mime type, but I cannot associate a file extension with a mime type.

=> I guess I should rather rely on file extensions (instead of MIME-types) in ledgerhub's import script?
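One way to rely on file extensions instead of libmagic's content sniffing is Python's standard `mimetypes` module, which guesses a type from the file name alone. The type string `application/x-ofx` is an assumption chosen for illustration, and `guess_mime` is a hypothetical helper, not something ledgerhub provides.

```python
import mimetypes

# Register the extensions explicitly so the guess does not depend on
# the platform's mime.types files (the type string is an assumption).
mimetypes.add_type("application/x-ofx", ".qfx")
mimetypes.add_type("application/x-ofx", ".ofx")


def guess_mime(path):
    """Guess a MIME type from the file name, with a generic fallback."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime or "application/octet-stream"
```

With this, `guess_mime("examples/Downloads/ofx.qfx")` gives `application/x-ofx` regardless of what libmagic reports for the contents. The trade-off is that a misnamed file is misclassified, which content-based patterns would catch.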

thx,
Johannes

Martin Blais

Oct 15, 2014, 11:31:35 AM
to johan...@gmail.com, bean...@googlegroups.com
On Wed, Oct 15, 2014 at 5:12 AM, <johan...@gmail.com> wrote:
Thank you for updating the example script!

1. Improved Setup Script

I changed the setup.py script to use setuptools instead of distutils because I still got import-not-found errors (which appear to be fixed by the new setup.py script). Feel free to pull from:
https://bitbucket.org/johannesjh/ledgerhub
Note that the new setup.py script obsoletes the scripts in the "bin" folder. I guess we can delete them now?

I don't think that's a good idea... I went with the least common denominator to avoid the setuptools dependency.
Can you detail what was failing?

About removing the bin scripts: I don't understand why you would do this. Why can't you just copy the files I provide under bin/?



2. MIME-type Problems

Using the new setup script, I can now successfully run the example import scripts (both "examples/importing.import" and "examples/other/example.import"). But MIME-type detection does not seem to work properly, so the file to be imported is not found (probably because I am working on MacOS rather than Linux).

$ python3
Python 3.4.1 (default, Aug 24 2014, 21:32:40)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> magic.from_file("examples/Downloads/ofx.qfx", mime=True)
b'text/plain'

...the above output shows that qfx files are recognized as b'text/plain' by python_magic.

Any idea how I could fix this?

Magic is imperfect magic, for sure.

You can just use some other pattern to match the file against your configuration; nothing forces you to rely on the MIME type. Use ledger-match-text to view the text that is being matched against; finding some regular expression unique to that file should do the trick. Don't try to make it perfect, just find a pattern that makes it work. Importing is a dirty job...



Note: There seem to be ways of teaching MacOS about a new Mime Type. E.g., see
http://superuser.com/questions/421792/how-to-associate-mime-type-with-a-handler-in-os-x
Using the above answer and tool, I can configure which application opens which mime type, but I cannot associate a file extension with a mime type.

=> I guess I should rather rely on file extensions (instead of MIME-types) in ledgerhub's import script?

You can do that. The filename is part of the preamble that LedgerHub prepends to the match text.
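The two suggestions above (match on a regular expression in the contents, or on the filename in the preamble) can be sketched together. The preamble layout (`Filename:` and `MimeType:` lines) and the helpers `build_match_text` and `matches` are assumptions for illustration; LedgerHub's actual preamble format may differ, so check the output of ledger-match-text.

```python
import os
import re


def build_match_text(path, mime_type, contents):
    """Prepend an assumed preamble (filename, MIME type) to the contents."""
    preamble = "Filename: {}\nMimeType: {}\n".format(os.path.basename(path), mime_type)
    return preamble + contents


def matches(pattern, match_text):
    """Search line by line, so ^ and $ anchor to individual lines."""
    return re.search(pattern, match_text, re.MULTILINE) is not None


text = build_match_text(
    "/home/user/Downloads/ofx.qfx", "text/plain", "OFXHEADER:100\n<OFX>\n")
```

Here `matches(r"^Filename: .*\.qfx$", text)` succeeds on the extension alone, and `matches(r"OFXHEADER:100", text)` succeeds on the contents, even though the MIME type was reported as `text/plain`.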




 

thx,
Johannes


johan...@gmail.com

Oct 15, 2014, 12:39:37 PM
to bean...@googlegroups.com, johan...@gmail.com
I changed the setup.py script to use setuptools instead of distutils because I still got import-not-found errors (which appear to be fixed by the new setup.py script). Feel free to pull from:
https://bitbucket.org/johannesjh/ledgerhub
Note that the new setup.py script obsoletes the scripts in the "bin" folder. I guess we can delete them now?

I don't think that's a good idea... I went with the least common denominator to avoid the setuptools dependency.
Can you detail what was failing?

Ok, I reproduced the problem. I uninstalled all ledgerhub files, checked out the old source, and ran "python3 setup.py install".
Then, running ./examples/importing.import then gives me:

Traceback (most recent call last):
  File "./examples/importing.import", line 18, in <module>
    from ledgerhub.importers.generic import ofx
ImportError: No module named 'ledgerhub.importers.generic'

...last time I saw this error, I guessed that maybe the setup.py script does not properly install the required module. So I started looking into recommendations regarding setup.py scripts. I found that the recommended way seems to be to use setuptools instead of distutils, e.g., see the python packaging user guide [1] and the pypa example project [2]. Since using setuptools solved my problem, I was happy to use it.

[1] https://python-packaging-user-guide.readthedocs.org/en/latest/distributing.html
[2] https://github.com/pypa/sampleproject
 

About removing the bin scripts: I don't understand why you do this. Why can't you just copy the files I provide under bin/?

Well, since setuptools automatically creates such scripts, there is no more need to code them manually.

Regarding the MIME-type problems, I'll opt for a quick and dirty solution. :-)

Martin Blais

Oct 16, 2014, 10:54:11 AM
to bean...@googlegroups.com
On Wed, Oct 15, 2014 at 12:39 PM, <johan...@gmail.com> wrote:
I changed the setup.py script to use setuptools instead of distutils because I still got import-not-found errors (which appear to be fixed by the new setup.py script). Feel free to pull from:
https://bitbucket.org/johannesjh/ledgerhub
Note that the new setup.py script obsoletes the scripts in the "bin" folder. I guess we can delete them now?

I don't think that's a good idea... I went with the least common denominator to avoid the setuptools dependency.
Can you detail what was failing?

Ok, I reproduced the problem. I uninstalled all ledgerhub files, checked out the old source, and ran "python3 setup.py install".
Then, running ./examples/importing.import then gives me:

Traceback (most recent call last):
  File "./examples/importing.import", line 18, in <module>
    from ledgerhub.importers.generic import ofx
ImportError: No module named 'ledgerhub.importers.generic'

...last time I saw this error, I guessed that maybe the setup.py script does not properly install the required module. So I started looking into recommendations regarding setup.py scripts. I found that the recommended way seems to be to use setuptools instead of distutils, e.g., see the python packaging user guide [1] and the pypa example project [2]. Since using setuptools solved my problem, I was happy to use it. 

Thanks Johannes,
I fixed the original setup.py.
I'll have to create a unit test for this going forward, or automate the finding of the libraries (not hard, will remove annoying out-of-sync mistakes like this).
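Automating the finding of the libraries without a setuptools dependency could look like the sketch below: walk the source tree and treat every directory containing `__init__.py` as a package, so a new subpackage such as `ledgerhub.importers.generic` can no longer be forgotten. The `lib/python` layout comes from this thread; the helper itself is a hypothetical sketch (setuptools ships a comparable `find_packages`).

```python
import os


def find_packages(src_root):
    """Return dotted names of all packages (dirs with __init__.py) under src_root."""
    packages = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        dirnames.sort()  # walk subdirectories in a deterministic order
        if "__init__.py" in filenames:
            rel = os.path.relpath(dirpath, src_root)
            packages.append(rel.replace(os.sep, "."))
    return packages

# In a distutils setup.py, the result would then be passed as:
#   setup(..., package_dir={"": "lib/python"},
#         packages=find_packages("lib/python"))
```

A unit test can then compare this list against the packages actually installed, which catches out-of-sync mistakes like the missing `ledgerhub.importers.generic`.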

I'm not going to switch to setuptools right away, but I'll have a look at it this weekend, if it does not add burden to the user I'll consider switching to it.

